CAPTIONING SYSTEM



The present invention relates to a system and method and 
parts thereof for providing captions for audio or video 
or multimedia presentations. The invention has
particular though not exclusive relevance to the
provision of such a captioning system to facilitate the
enjoyment of the audio, video or multimedia presentation 
by people with sensory disabilities. 

A significant proportion of the population with hearing
difficulties benefits from captions (in the form of text)
on video images such as TV broadcasts, video tapes, DVDs
and films. There are currently two types of captioning
systems available for video images: on-screen caption
systems and off-screen caption systems. In on-screen
caption systems, the caption text is displayed on-screen
and it obscures part of the image. This presents a
particular problem in cinemas, where there is a
reluctance for this to happen with general audiences.
In the off-screen caption system, the text is displayed
on a separate screen. Whilst this overcomes some of the
problems associated with the on-screen caption system,
this solution adds cost and complexity and, for this
reason, has so far had poor take-up in cinemas.

In addition to text captioning systems for people with
hearing difficulties, there are also captioning systems
which provide audio captions for people with impaired
eyesight. In this type of audio captioning system, an
audio description of what is being displayed is provided
to the user in a similar way to the way in which
subtitles are provided for the hard of hearing.

One aim of the present invention is to provide an 
alternative captioning system for the hard of hearing or 
an alternative captioning system for those with impaired 
eyesight. The captioning system can also be used by
those without impaired hearing or eyesight, for example,
to provide different language captions or the lyrics for
songs.

According to one aspect, the present invention provides 
a captioning system comprising: a caption store for 
storing one or more sets of captions each being 
associated with one or more presentations and each set 
comprising at least one caption for playout at different 
timings during the associated presentation; and a user 
device having: (i) a memory for receiving and storing
at least one set of captions for a presentation from the
caption store; (ii) a receiver operable to receive
synchronisation information defining the timing during 
the presentation at which each caption in the received 
set of captions is to be output to the user; and (iii) 
a caption output circuit operable to output to the 
associated user, the captions in the received set of 
captions at the timings defined by the synchronisation 
information. 

In one embodiment, the captions are text captions which 
are output to the user on a display associated with the 
user device. In another embodiment, the captions are
audio signals which are output to the user as acoustic
signals via a loudspeaker or earphone. The captioning
system can be used, for example in cinemas, to provide
captions to people with sensory disabilities to
facilitate their understanding and enjoyment of, for
example, films or other multimedia presentations.

The user device is preferably a portable hand-held device
such as a mobile telephone or personal digital assistant,
as these are small and lightweight and most users have
access to them. The use of such a portable computing
device is also preferred since it is easy to adapt the
device to operate in the above manner by providing the
device with appropriate software.

The caption store may be located in a remote server in 
which case the user device is preferably a mobile 
telephone (or a PDA having wireless connectivity) as this 
allows for the direct connection between the user device 
and the remote server. Alternatively, the caption store 
may be a kiosk at the venue at which the presentation is 
to be made, in which case the user can download the 
captions and synchronisation information when they 
arrive. Alternatively, the caption store may simply be
a memory card or smart-card which the user can insert
into their user device in order to obtain the set of
captions for the presentation together with the 
synchronisation information. 

According to another aspect, the present invention
provides a method of manufacturing a computer readable
medium storing caption data and synchronisation data for
use in a captioning system, the method comprising:
providing a computer readable medium; providing a set of
captions that is associated with a presentation and which
comprises a plurality of captions for playout at
different timings during the associated presentation;
providing synchronisation information defining the timing
during the presentation at which each caption in the set
of captions is to be output to a user; receiving a
computer readable medium; recording computer readable
data defining said set of captions and said
synchronisation information on said computer readable
medium; and outputting the computer readable medium
having the recorded caption and synchronisation data
thereon.

Exemplary embodiments of the present invention will now
be described with reference to the accompanying drawings,
in which: 

Figure 1 is a schematic overview of a captioning system 
embodying the present invention; 

Figure 2a is a schematic block diagram illustrating the 
main components of a user telephone that is used in the 
captioning system shown in Figure 1; 

Figure 2b is a table representing the captions in a
caption file downloaded to the telephone shown in Figure
2a from the remote web server shown in Figure 1;

Figure 2c is a representation of a synchronisation file
downloaded to the mobile telephone shown in Figure 2a
from the remote web server shown in Figure 1;

Figure 2d is a timing diagram illustrating the timing of
synchronisation signals and illustrating timing windows
during which the mobile telephone processes an audio
signal from a microphone thereof;

Figure 2e is a signal diagram illustrating an exemplary
audio signal received by a microphone of the telephone
shown in Figure 2a and the signature stream generated by
a signature extractor forming part of the mobile
telephone;

Figure 2f illustrates an output from a correlator forming
part of the mobile telephone shown in Figure 2a, which
is used to synchronise the display of captions to the
user with the film being watched;

Figure 2g schematically illustrates a screen shot from
the telephone illustrated in Figure 2a showing an example
caption that is displayed to the user;

Figure 3 is a schematic block diagram illustrating the
main components of the remote web server forming part of
the captioning system shown in Figure 1;

Figure 4 is a schematic block diagram illustrating the
main components of a portable user device of an
alternative embodiment; and

Figure 5 is a schematic block diagram illustrating the
main components of the remote server used with the
portable user device shown in Figure 4.



OVERVIEW

Figure 1 schematically illustrates a captioning system
for use in providing text captions on a number of user
devices (two of which are shown and labelled 1-1 and 1-2)
for a film being shown on a screen 3 within a cinema 5.
The captioning system also includes a remote web server
7 which controls access by the user devices 1 to captions
stored in a captions database 9. In particular, in this
embodiment, the user device 1-1 is a mobile telephone
which can connect to the remote web server 7 via a
cellular communications base station 11, a switching
centre 13 and the Internet 15 to download captions from
the captions database 9. In this embodiment, the second
user device 1-2 is a personal digital assistant (PDA)
that does not have cellular telephone transceiver
circuitry. This PDA 1-2 can, however, connect to the
remote web server 7 via a computer 17 which can connect
to the Internet 15. The computer 17 may be a home
computer located in the user's home 19 and may typically
include a docking station 21 for connecting the PDA 1-2
with the computer 17.

In this embodiment, the operation of the captioning
system using the mobile telephone 1-1 is slightly
different to the operation of the captioning system using
the PDA 1-2. A brief description of the operation of the
captioning system using these devices will now be given.

In this embodiment, the mobile telephone 1-1 operates to
download the captions for the film to be viewed at the
start of the film. It does this by capturing a portion
of soundtrack from the beginning of the film, generated
by speakers 23-1 and 23-2, which it processes to generate
a signature that is characteristic of the audio segment.
The mobile telephone 1-1 then transmits this signature
to the remote web server 7 via the base station 11,
switching centre 13 and the Internet 15. The web server
7 then identifies the film that is about to begin from
the signature and retrieves the appropriate caption file
together with an associated synchronisation file which
it transmits back to the mobile telephone 1-1 via the
Internet 15, switching centre 13 and base station 11.
After the caption file and the synchronisation file have 
been received by the mobile telephone 1-1, the connection 
with the base station 11 is terminated and the mobile 
telephone 1-1 generates and displays the appropriate 
captions to the user in synchronism with the film that 
is shown on the screen 3. In this embodiment, the 
synchronisation data in the synchronisation file 
downloaded from the remote web server 7 defines the 
estimated timing of subsequent audio segments within the 
film and the mobile telephone 1-1 synchronises the 
playout of the captions by processing the audio signal 
of the film and identifying the actual timing of those 
subsequent audio segments in the film. 

In this embodiment, the user of the PDA 1-2 downloads the
captions for the film while they are at home 19 using
their personal computer 17 in advance of the film being
shown. In particular, in this embodiment, the user types
in the name of the film that they are going to see into
the personal computer 17 and then sends this information
to the remote web server 7 via the Internet 15. In
response, the web server 7 retrieves the appropriate
caption file and synchronisation file for the film which
it downloads to the user's personal computer 17. The
personal computer 17 then stores the caption file and the
synchronisation file in the PDA 1-2 via the docking
station 21. In this embodiment, the subsequent operation
of the PDA 1-2 to synchronise the display of the captions 
to the user during the film is the same as the operation 
of the mobile telephone 1-1 and will not, therefore, be 
described again. 

MOBILE TELEPHONE 

A brief description has been given above of the way in 
which the mobile telephone 1-1 retrieves and subsequently 
plays out the captions for a film to a user. A more 
detailed description will now be given of the main 
components of the mobile telephone 1-1 which are shown 
in block form in Figure 2a. As shown, the mobile 
telephone 1-1 includes a microphone 41 for detecting the 
acoustic sound signal generated by the speakers 23 in the 
cinema 5 and for generating a corresponding electrical 
audio signal. The audio signal from the microphone 41 
is then filtered by a filter 43 to remove frequency 
components that are not of interest. The filtered audio 
signal is then converted into a digital signal by the
analogue to digital converter (ADC) 45 and then stored
in an input buffer 47. The audio signal written into the
input buffer 47 is then processed by a signature
extractor 49 which processes the audio to extract a
signature that is characteristic of the buffered audio.
Various processing techniques can be used by the
signature extractor 49 to extract this signature. For
example, the signature extractor may carry out the
processing described in WO 02/11123 in the name of Shazam
Entertainment Limited. In this system, a window of about
15 seconds of audio is processed to identify a number of
"fingerprints" along the audio string that are
representative of the audio. These fingerprints, together
with timing information of when they occur within the
audio string, form the above described signature.

As shown in Figure 2a, the signature generated by the 
signature extractor is then output to an output buffer 
51 and then transmitted to the remote web server 7 via 
the antenna 53, a transmission circuit 55, a digital to 
analogue converter (DAC) 57 and a switch 59. 

As will be described in more detail below, the remote 
server 7 then processes the received signature to 
identify the film that is playing and to retrieve the 
appropriate caption file and synchronisation file for the 
film. These are then downloaded back to the mobile 
telephone 1-1 and passed, via the aerial 53, reception
circuit 61 and analogue to digital converter 63, to a
caption memory 65. Figure 2b schematically illustrates
the form of the caption file 67 downloaded from the
remote web server 7. As shown, in this embodiment, the
caption file 67 includes an ordered sequence of captions
(caption(1) to caption(N)) 69-1 to 69-N. The caption
file 67 also includes, for each caption, formatting
information 71-1 to 71-N that defines the font, colour,
etc. of the text to be displayed. The caption file 67
also includes, for each caption, a time value t_1 to t_N
which defines the time at which the caption should be
output to the user relative to the start of the film.
Finally, in this embodiment, the caption file 67
includes, for each caption 69, a duration Δt_1 to Δt_N
which defines the duration that the caption should be
displayed to the user.
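
By way of illustration only, the caption file described above can be modelled as a list of records, each holding the caption text, its formatting information, its time value and its duration. The following is a minimal sketch; the names Caption, caption_at and caption_file, and the use of seconds as the time unit, are assumptions made for the example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Caption:
    text: str          # caption 69
    formatting: dict   # formatting information 71 (font, colour, etc.)
    start: float       # time value t, in seconds from the start of the film
    duration: float    # duration Δt, in seconds


def caption_at(captions: List[Caption], film_time: float) -> Optional[Caption]:
    """Return the caption that should be visible at film_time, if any."""
    for caption in captions:
        if caption.start <= film_time < caption.start + caption.duration:
            return caption
    return None


# Hypothetical two-caption file used only to exercise the lookup.
caption_file = [
    Caption("Hello.", {"colour": "white"}, start=12.0, duration=2.5),
    Caption("Who is there?", {"colour": "yellow"}, start=15.0, duration=3.0),
]
```

In this sketch a film time that falls between two captions returns no caption, so nothing is displayed during the gap.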



Figure 2c schematically represents the data within the
synchronisation file 73 which is used in this embodiment
by the mobile telephone 1-1 to synchronise the display
of the captions with the film. As shown, the
synchronisation file 73 includes a number of signatures
75-1 to 75-M each having an associated time value t_1^s to
t_M^s identifying the time at which the signature should
occur within the audio of the film (again calculated from
the beginning of the film).



In this embodiment, the synchronisation file 73 is passed
to a control unit 81 which controls the operation of the
signature extractor 49 and a sliding correlator 83.
The control unit 81 also controls the position of the
switch 59 so that after the caption and synchronisation
files have been downloaded into the mobile telephone 1-1,
and the mobile telephone 1-1 is trying to synchronise the
output of the captions with the film, the signature
stream generated by the signature extractor 49 is passed
to the sliding correlator 83 via the output buffer 51 and
the switch 59.

Initially, before the captions are output to the user, 
the mobile telephone 1-1 must synchronise with the film 
that is playing. This is achieved by operating the 
signature extractor 49 and the sliding correlator 83 in 
an acquisition mode, during which the signature extractor 
extracts signatures from the audio received at the 
microphone 41 which are then compared with the signatures
75 in the synchronisation file 73, until it identifies 
a match between the received audio from the film and the 
signatures 75 in the synchronisation file 73. This match 
identifies the current position within the film, which 
is used to identify the initial caption to be displayed 
to the user. At this point, the mobile telephone 1-1 
enters a tracking mode during which the signature 
extractor 49 only extracts signatures for the audio 
during predetermined time slots (or windows) within the 
film corresponding to when the mobile telephone 1-1 
expects to detect the next signature in the audio track 
of the film. This is illustrated in Figure 2d which 
shows a time line (representing the time line for the
film) together with the timings t_1^s to t_M^s corresponding
to when the mobile telephone 1-1 expects the signatures
to occur within the audio track of the film. Figure 2d
also shows a small time slot or window w_1 to w_M around
each of these time points, during which the signature
extractor 49 processes the audio signal to generate a
signature stream which it outputs to the output buffer
51.

The generation of the signature stream is illustrated in
Figure 2e which shows a portion 77 of the audio track
corresponding to one of the time windows w_i and the
stream 79 of signatures generated by the signature
extractor 49. In this embodiment, three signatures
(signature (i), signature (i+1) and signature (i+2)) are
generated for each processing window w. This is for
illustration purposes only. In practice, many more or
fewer signatures may be generated for each processing
window w. Further, whilst in this embodiment the
signatures are generated from non-overlapping subwindows 
of the processing window w, the signatures may also be 
generated from overlapping subwindows. The way in which 
this would be achieved will be well known to those 
skilled in the art and will not be described in any 
further detail. 

In this embodiment, between adjacent processing windows
w, the control unit 81 controls the signature extractor
49 so that it does not process the received audio. In
this way, the processing performed by the signature 
extractor 49 can be kept to a minimum. 
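
The windowed processing described above can be sketched as follows. This is an illustrative example only: it assumes that each window is centred on an expected signature time and has a fixed half-width, and the function names tracking_windows and active_window and the two-second half-width are hypothetical.

```python
from typing import List, Optional, Tuple


def tracking_windows(expected_times: List[float],
                     half_width: float) -> List[Tuple[float, float]]:
    """Centre one processing window on each expected signature time."""
    return [(t - half_width, t + half_width) for t in expected_times]


def active_window(windows: List[Tuple[float, float]],
                  film_time: float) -> Optional[int]:
    """Index of the window containing film_time, or None between windows."""
    for index, (start, end) in enumerate(windows):
        if start <= film_time <= end:
            return index
    return None


# Hypothetical expected signature times, in seconds from the start of the film.
windows = tracking_windows([60.0, 120.0, 180.0], half_width=2.0)
```

When active_window returns None the signature extractor would be left idle, which corresponds to the reduced processing between adjacent windows.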

During this tracking mode of operation, the sliding 
correlator 83 is operable to correlate the generated 
signature stream in output buffer 51 with the next 
signature 75 in the synchronisation file 73. This 
correlation generates a correlation plot such as that 
shown in Figure 2f for the window of audio being 
processed. As shown in Figure 2d, in this embodiment,
the windows w_i are defined so that the expected timing of
the signature is in the middle of the window. This means
that the mobile telephone 1-1 expects the peak output
from the sliding correlator 83 to correspond to the
middle of the processing window w. If the peak occurs
earlier or later in the window then the caption output
timing of the mobile telephone 1-1 must be adjusted to
keep it in synchronism with the film. This is
illustrated in Figure 2f which shows the expected time
of the signature t^s appearing in the middle of the window
and the correlation peak occurring δt seconds before the
expected time. This means that the mobile telephone 1-1
is slightly behind the film and the output timing of the
subsequent captions must be brought forward to catch up
with the film. This is achieved by passing the δt value
from the correlator 83 into a timing controller 85 which
generates the timing signal for controlling the time at
which the captions are played out to the user. As shown,
the timing controller receives its timing reference from
the mobile telephone clock 87. The generated timing
signal is then passed to a caption display engine 89
which uses the timing signal to index the caption file
67 in order to retrieve the next caption 69 for display
together with the associated duration information Δt and
formatting information 71 which it then processes and
outputs for display on the mobile telephone display 91 
via a frame buffer 93. The details of how the caption 
69 is generated and formatted are well known to those 
skilled in the art and will not be described in any 
further detail. 
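
The timing correction described above may be illustrated with a short sketch. It assumes, purely for the example, that the signature stream and the stored signature are short sequences of numbers sampled at a fixed step, that correlation is a plain dot product at each alignment, and that the timing controller keeps a single offset added to the telephone clock. The names sliding_correlation, timing_error and TimingController are hypothetical.

```python
from typing import List, Sequence


def sliding_correlation(stream: Sequence[float],
                        reference: Sequence[float]) -> List[float]:
    """Correlate reference against stream at every alignment that fits."""
    span = len(reference)
    return [
        sum(s * r for s, r in zip(stream[offset:offset + span], reference))
        for offset in range(len(stream) - span + 1)
    ]


def timing_error(stream: Sequence[float], reference: Sequence[float],
                 window_start: float, step: float,
                 expected_time: float) -> float:
    """Peak time minus expected time; negative means the peak came early."""
    scores = sliding_correlation(stream, reference)
    peak_offset = max(range(len(scores)), key=scores.__getitem__)
    return window_start + peak_offset * step - expected_time


class TimingController:
    """Keeps the offset between the device clock and the film time line."""

    def __init__(self) -> None:
        self.offset = 0.0

    def film_time(self, clock_time: float) -> float:
        return clock_time + self.offset

    def correct(self, error: float) -> None:
        # An early peak (negative error) means the device is behind the film,
        # so the caption timing is brought forward by the same amount.
        self.offset -= error


# Hypothetical window starting at 58.0 s with one stream value every 0.5 s;
# the stored signature is expected at 60.0 s but is found at 59.0 s.
stream = [0, 0, 1, -1, 1, 0, 0, 0, 0]
reference = [1, -1, 1]
error = timing_error(stream, reference, window_start=58.0, step=0.5,
                     expected_time=60.0)
controller = TimingController()
controller.correct(error)
```

Here the peak is found one second before the expected time, so the controller advances its film time by one second for all subsequent captions.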

Figure 2g illustrates the form of an example caption
which is output on the display 91. Figure 2g also shows
in the right hand side 95 of the display 91 a number of
user options that the user can activate by pressing
appropriate function keys on the keypad 97 of the mobile
telephone 1-1. These include a language option 99 which
allows the user to change the language of the caption 69
that is displayed. This is possible, provided the
caption file 67 includes captions 69 in different
languages. As the skilled man will appreciate, this does
not involve any significant processing on the part of the
mobile telephone 1-1, since all that is being changed is
the text of the caption 69 that is to be displayed at the
relevant timings. It is therefore possible to
personalise the captions for different users watching the
same film. The options also include an exit option 101
for allowing the user to exit the captioning application
being run on the mobile telephone 1-1.
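
As a simple illustration of why the language option requires little processing, the sketch below assumes a caption record that stores one text per language code alongside a single shared timing. The record layout, the language codes and the function name caption_text are assumptions for the example only.

```python
from typing import Dict

# Hypothetical layout: each caption keeps one text per language code,
# while the start time is shared by all languages.
captions = [
    {"start": 12.0, "text": {"en": "Hello.", "fr": "Bonjour."}},
    {"start": 15.0, "text": {"en": "Who is there?"}},
]


def caption_text(caption: Dict, language: str, fallback: str = "en") -> str:
    """Pick the text for the chosen language, falling back if it is missing."""
    texts = caption["text"]
    return texts[language] if language in texts else texts[fallback]
```

Changing the language only changes which text is selected; the timing used to play out the caption is untouched.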

PERSONAL DIGITAL ASSISTANT 

As mentioned above, the PDA 1-2 operates in a similar way
to the mobile telephone 1-1 except it does not include
the mobile telephone transceiver circuitry for connecting
directly to the web server 7. The main components of the
PDA 1-2 are similar to those of the mobile telephone 1-1
described above and will not, therefore, be described
again.

REMOTE WEB SERVER 

Figure 3 is a schematic block diagram illustrating the
main components of the web server 7 used in this
embodiment and showing the captions database 9. As
shown, the web server 7 receives input from the Internet
15 which is either passed to a sliding correlator 121 or
to a database reader 123, depending on whether the
input is from the mobile telephone 1-1 or from the PDA
1-2. In particular, the signature from the mobile
telephone 1-1 is input to the sliding correlator 121 
where it is compared with signature streams of all films 
known to the system, which are stored in the signature 
stream database 125. The results of these correlations 
are then compared to identify the film that the user is 
about to watch. This film ID is then passed to the 
database reader 123. In response to receiving a film ID 
either from the sliding correlator 121 or directly from 
a user device (such as the PC 17 or PDA 1-2), the 
database reader 123 reads the appropriate caption file 
67 and synchronisation file 73 from the captions database 
9 and outputs them to a download unit 127. The download 
unit 127 then downloads the retrieved caption file 67 and 
synchronisation file 73 to the requesting user device 1 
via the Internet 15. 
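
The identification step performed by the web server can be sketched as below. This is an assumption-laden illustration, not the correlator of the embodiment: signatures are represented as short sequences of +1 and -1 values, the correlation is a dot product at each alignment, and the film identifiers, file names and function names are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def peak_correlation(query: Sequence[int], stream: Sequence[int]) -> int:
    """Highest dot product of query against any aligned slice of stream."""
    span = len(query)
    return max(
        sum(q * s for q, s in zip(query, stream[offset:offset + span]))
        for offset in range(len(stream) - span + 1)
    )


def identify_film(query: Sequence[int],
                  signature_streams: Dict[str, List[int]]) -> str:
    """Compare the per-film correlation peaks and return the best film ID."""
    return max(
        signature_streams,
        key=lambda film_id: peak_correlation(query, signature_streams[film_id]),
    )


def read_files(film_id: str,
               captions_db: Dict[str, Tuple[str, str]]) -> Tuple[str, str]:
    """Database reader: return (caption file, synchronisation file)."""
    return captions_db[film_id]


# Hypothetical signature stream database and captions database.
signature_streams = {
    "film-a": [1, -1, -1, 1, 1, -1, 1, -1],
    "film-b": [-1, -1, 1, -1, -1, 1, 1, 1],
}
captions_db = {
    "film-a": ("captions-a.dat", "sync-a.dat"),
    "film-b": ("captions-b.dat", "sync-b.dat"),
}
query = [1, 1, -1, 1]
```

A request that already carries a film ID, such as one from the PC 17, would skip identify_film and call read_files directly.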

As those skilled in the art will appreciate, a captioning 
system has been described above for providing text 
captions for a film for display to a user. The system 
does not require any modifications to the cinema or 
playout system, but only the provision of a suitably 
adapted mobile telephone 1-1 or PDA device 1-2 or the
like. In this regard, it is not essential to add any
additional hardware to the mobile telephone or the PDA,
since all of the functionality enclosed in the dashed box
94 can be performed by an appropriate software
application run within the mobile telephone 1-1 or PDA
1-2. In this case, the appropriate software application
may be loaded at the appropriate time, e.g. when the user
enters the cinema and, in the case of the mobile telephone
1-1, is arranged to cancel the ringer on the telephone
so that incoming calls do not disturb others in the
audience. The above captioning system can therefore be
used for any film at any time. Further, since different
captions can be downloaded for a film, the system allows
for content variation within a single screening. This
facilitates, for example, the provision of captions in
multiple languages.

Modifications and Alternative Embodiments 

In the above embodiment, a captioning system was
described for providing text captions on a display of a
portable user device for allowing users with hearing
disabilities to understand a film being watched. As
discussed in the introduction of this application, the
above captioning system can be modified to operate with
audio captions (e.g. audio descriptions of the film being
displayed for people with impaired eyesight). This may
be done simply by replacing the text captions 69 in the
caption file 67 that is downloaded from the remote server
7 with appropriate audio files (such as the standard .WAV
or MP3 audio files) which can then be played out to the
user via an appropriate headphone or earpiece. The
synchronisation of the playout of the audio files could
be the same as for the synchronisation of the playout of
the text captions. Alternatively, synchronisation can be
achieved in other ways. Figure 4 is a block diagram
illustrating the main components of a mobile telephone
that can be used in such an audio captioning system. In
Figure 4, the same reference numerals have been used for
the same components shown in Figure 2a and these
components will not be described again.

In this embodiment, the mobile telephone 1-1' does not
include the signature extractor 49. Instead, as
illustrated in Figure 5, the signature extractor 163 is
provided in the remote web server 7'. In operation, the
mobile telephone 1-1' captures part of the audio played
out at the beginning of the film and transmits this audio
through to the remote web server 7'. This audio is then
buffered in the input buffer 161 and then processed by
the signature extractor 163 to extract a signature
representative of the audio. This signature is then
passed to a correlation table 165 which performs a
similar function to the sliding correlator 121 and
signature stream database 125 described in the first
embodiment, to identify the ID for the film currently
being played. In particular, in this embodiment, all of
the possible correlations that may have been performed
by the sliding correlator 121 and the signature stream
database 125 are carried out in advance and the results
are stored in the correlation table 165. In this way,
the signature output by the signature extractor 163 is
used to index this correlation table to generate
correlation results for the different films known to the
captioning system. These correlation results are then
processed to identify the most likely film corresponding
to the received audio stream. In this embodiment, the
captions database 9 only includes the caption files 67
for the different films, without any synchronisation
files 73. In response to receiving the film ID either
from the correlation table 165 or directly from a
user device (not shown), the database reader 123
retrieves the appropriate caption file 67 which it
downloads to the user device 1 via the download unit 127.
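
The use of a pre-computed correlation table can be illustrated as follows. The sketch assumes that a signature can be reduced to a hashable key and that the table stores one correlation result per known film for each key; the keys, scores and the name most_likely_film are invented for the example.

```python
from typing import Dict

# Hypothetical pre-computed table: signature key -> correlation result per film.
correlation_table: Dict[str, Dict[str, float]] = {
    "sig-017": {"film-a": 0.91, "film-b": 0.12},
    "sig-042": {"film-a": 0.08, "film-b": 0.87},
}


def most_likely_film(signature_key: str,
                     table: Dict[str, Dict[str, float]]) -> str:
    """Index the table with the signature and pick the best-scoring film."""
    results = table[signature_key]
    return max(results, key=results.get)
```

The lookup replaces the run-time correlation of the first embodiment with a single table access followed by a comparison of the stored results.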

Returning to Figure 4, in this embodiment, since the
mobile telephone 1-1' does not include the signature
extractor 49, synchronisation is achieved in an
alternative manner. In particular, in this embodiment,
synchronisation codes are embedded within the audio track
of the film. Therefore, after the caption file 67 has
been stored in the caption memory 65, the control circuit
81 controls the position of the switch 143 so that the
audio signal input into the input buffer 47 is passed to
a data extractor 145 which is arranged to extract the
synchronisation data that is embedded in the audio track.
The extracted synchronisation data is then passed to the
timing controller 85 which controls the timing at which
the individual audio captions are played out by the
caption player 147 via the digital-to-analogue converter
149, amplifier 151 and the headset 153.

As those skilled in the art will appreciate, various
techniques can be used to embed the synchronisation data
within the audio track. The applicant's earlier
International applications WO 98/32248, WO 01/10065,
PCT/GB01/05300 and PCT/GB01/05306 describe techniques for
embedding data within acoustic signals and appropriate
data extractors for subsequently extracting the embedded
data. The contents of these earlier International
applications are incorporated herein by reference.

In the above audio captioning embodiment, synchronisation 
was achieved by embedding synchronisation codes within 
the audio and detecting these in the mobile telephone. 
As those skilled in the art will appreciate, a similar 
technique may be used in the first embodiment. However, 
embedding audio codes within the soundtrack of the film 
is not preferred, since it involves modifying in some way 
the audio track of the film. Depending on the data rates 
involved, this data may be audible to some viewers which 
may detract from their enjoyment of the film. The first 
embodiment is therefore preferred since it does not 
involve any modification to the film or to the cinema
infrastructure.

In embodiments where the synchronisation data is embedded
within the audio, the synchronisation codes used can
either be the same code repeated whenever synchronisation
is required or a unique code at each
synchronisation point. The advantage of having a unique
code at each synchronisation point is that a user who 
enters the film late or who requires the captions only 
at certain points (for example a user who only rarely 
requires the caption) can start captioning at any point 
during the film. 
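
The difference between the two coding schemes can be sketched as below. The example assumes that unique codes are small integers mapped to film positions and that repeated codes are spaced at a regular interval starting at the beginning of the film; neither assumption is taken from the specification, and all names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical unique codes: each code value identifies one point in the film.
UNIQUE_CODE_TIMES: Dict[int, float] = {1: 0.0, 2: 60.0, 3: 120.0, 4: 180.0}


def position_from_unique_code(code: int) -> float:
    """A unique code gives the film position directly, even for a late joiner."""
    return UNIQUE_CODE_TIMES[code]


def position_from_repeated_code(codes_heard: int, interval: float,
                                heard_from_start: bool) -> Optional[float]:
    """A repeated code only gives a position if every code has been counted."""
    if not heard_from_start:
        return None
    return (codes_heard - 1) * interval
```

With unique codes, a user who starts captioning two minutes into the film can recover the position from the first code detected; with a repeated code, the count is meaningless unless it began at the start of the film.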

In the embodiment described above with reference to 
Figures 4 and 5, the signature extraction operation was
performed in the remote web server rather than in the
mobile telephone. As those skilled in the art will 
appreciate, this modification can also be made to the 
first embodiment described above, without the other 
modifications described with reference to Figures 4 and 
5. 

In the first embodiment described above, during the
tracking mode of operation, the signature extractor only
processed the audio track during predetermined windows
in the film. As those skilled in the art will
appreciate, this is not essential. The signature
extractor could operate continuously. However, such an
embodiment is not preferred since it increases the 
processing that the mobile telephone has to perform which 
is likely to increase the power consumption of the mobile 
telephone. 

In the above embodiments, the mobile telephone or PDA 
monitored the audio track of the film for synchronisation 
purposes. As those skilled in the art will appreciate, 
the mobile telephone or PDA device may be configured to 
monitor the video being displayed on the film screen. 
However, this is currently not preferred because it would 
require an image pickup device (such as a camera) to be 
incorporated into the mobile telephone or PDA and 
relatively sophisticated image processing hardware and 
software to be able to detect the synchronisation points 
or codes in the video. Further, it is not essential to 
detect synchronisation codes or synchronisation points 
from the film itself. Another electromagnetic or 
pressure wave signal may be transmitted in synchronism
with the film to provide the synchronisation points or 
synchronisation codes. In this case, the user device 
would have to include an appropriate electromagnetic or 
pressure wave receiver. However, this embodiment is not 
preferred since it requires modification to the existing 
cinema infrastructure and it requires the generation of 
the separate synchronisation signal which is itself 
synchronised to the film. 

In the above embodiments, the captions and, where
appropriate, the synchronisation data were downloaded to
a user device from a remote server. As those skilled in 
the art will appreciate, the use of such a remote server 
is not essential. The caption data and the 
synchronisation data may be pre-stored in memory cards 
or smart cards and distributed or sold at the cinema. 
In this case, the user device would preferably have an 
appropriate slot for receiving the memory card or smart- 
card and an appropriate reader for accessing the caption 
data and, if provided, the synchronisation data. The 
manufacture of the cards would include the steps of 
providing the memory card or smart-card and using an 
appropriate card writer to write the captions and 
synchronisation data into the memory card or into a
memory on the smart-card. Alternatively still, the user
may already have a smart-card or memory card associated
with their user device which they simply insert into a 
kiosk at the cinema where the captions and, if 
applicable, the synchronisation data are written into a 
memory on the card.

As a further alternative, the captions and
synchronisation data may be transmitted to the user 
device from a transmitter within the cinema. This 
transmission may be over an electromagnetic or a pressure 
wave link. 

In the first embodiment described above, the mobile 
telephone had an acquisition mode and a subsequent 
tracking mode for controlling the playout of the 
captions. In an alternative embodiment, the acquisition 
mode may be dispensed with, provided that the remote 
server can identify the current timing from the signature 
received from the mobile telephone. This may be possible 
in some instances. However, if the introduction of the 
film is repetitive then it may not be possible for the 
web server to be able to provide an initial 
synchronisation.

In the first embodiment described above, the user devices 
downloaded the captions and synchronisation data from a 
remote web server via the internet. As those skilled in 
the art will appreciate, it is not essential to download 
the files over the internet. The files may be downloaded 
over any wide area or local area network. The ability 
to download the caption files from a wide area network 
is preferred since centralised databases of captions may 
be provided for distribution over a wider geographic 
area. 

In the first embodiment described above, the user 
downloaded captions and synchronisation data from a 
remote web server. Although not described, for security
purposes, the caption file and the synchronisation file 
are preferably encoded or encrypted in some way to guard 
against fraudulent use of the captions. Additionally, 
the caption system may be arranged so that it can only 
operate in cinemas or at venues that are licensed under 
the captioning system. In this case, an appropriate 
activation code may be provided at the venue in order to 
"unlock" the captioning system on the user device. This 
activation code may be provided in human readable form so
that the user has to key the code into the user device.
Alternatively, the venue may be arranged to transmit the 
code (possibly embedded in the film) to an appropriate 
receiver in the user device. In either case, the 
captioning system software in the user device would have 
an inhibitor that would inhibit the outputting of the 
captions until it received the activation code. Further, 
where encryption is used, the activation code may be used 
as part of the key for decrypting the captions. 
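
One way in which the inhibitor and the use of the activation code as part
of the decryption key might be arranged is sketched below. This is an
illustrative assumption only: the class and function names, the use of
PBKDF2 with SHA-256, and the combination of a device secret with the
activation code are hypothetical and are not specified in the embodiments
described above. The decryption step itself is omitted.

```python
import hashlib
import hmac


def derive_key(device_secret: bytes, activation_code: str, salt: bytes) -> bytes:
    """Combine a device secret with the venue activation code into a 32-byte key."""
    material = device_secret + activation_code.encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", material, salt, 100_000, dklen=32)


class CaptionInhibitor:
    """Withholds captions until the expected activation code has been received."""

    def __init__(self, expected_digest: bytes):
        # SHA-256 digest of the valid activation code (hypothetical scheme).
        self._expected = expected_digest
        self._unlocked = False

    def submit(self, activation_code: str) -> bool:
        """Check a keyed-in or received code; unlock only if it matches."""
        digest = hashlib.sha256(activation_code.encode("utf-8")).digest()
        self._unlocked = hmac.compare_digest(digest, self._expected)
        return self._unlocked

    def output(self, caption: str):
        """Return the caption if unlocked, otherwise None."""
        return caption if self._unlocked else None
```

Because the key in this sketch depends on the activation code, a caption
file copied to a device that never receives the code cannot be decrypted
with the correct key, in addition to its output being inhibited.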

The above embodiments have described text captioning 
systems and audio captioning systems for use in a cinema. 
As those skilled in the art will appreciate, these 
captioning systems may be used for providing captions for
any radio, video or multi-media presentation. They can
also be used in the theatre or opera or within the user's
home.

Various captioning systems have been described above 
which provide text or audio captions for an audio or a 
video presentation. The captions may include extra 
commentary about the audio or video presentation, such
as director's comments, explanation of complex plots, 
the names of actors in the film or third party comments. 
The captions may also include adverts for other products 
or presentations. In addition, the audio captioning 
system may be used not only to provide audio descriptions 
of what is happening in the film, but also to provide a 
translation of the audio track for the film. In this 
way, each listener in the film can listen to the film in 
their preferred language. The caption system can also
be used to provide karaoke captions for use with standard 
audio tracks. In this case, the user would download the 
lyrics and the synchronisation information which define 
the timing at which the lyrics should be displayed and 
highlighted to the user. 

In addition to the above, the captioning system described 
above may be provided to control the display of video 
captions. For example, such video captions can be used 
to provide sign language (either real images or computer 
generated images) for the audio in the presentation being
given.

In the above embodiments, the captions for the 
presentation to be made were downloaded in advance for 
playout. In an alternative embodiment, the captions may 
be downloaded from the remote server by the user device 
when they are needed. For example, the user device may 
download the next caption when it receives the next 
synchronisation code for the next caption.

In the caption system described above, a user downloads
or receives the captions and the synchronisation 
information either from a web server or locally at the 
venue at which the audio or visual presentation is to be 
made. As those skilled in the art will appreciate, for 
applications where the user has to pay to download or 
playout the captions, a transaction system is preferably 
provided to facilitate the collection of the monies due. 
In embodiments where the captions are downloaded from a 
web server, this transaction system preferably forms part
of or is associated with the web server providing the 
captions. In this case, the user can provide electronic 
payment or payment through credit card or the like at the 
time that they download the captions. This is preferred, 
since it is easier to link the payment being made with 
the captions and synchronisation information downloaded. 

In the first embodiment described above, the ID for the 
film was automatically determined from an audio signature 
transmitted from the user's mobile telephone. 
Alternatively, instead of transmitting the audio 
signature, the user can input the film ID directly into 
the telephone for transmission to the remote server. In 
this case, the correlation search of the signature 
database is not essential. 

In the first embodiment described above, the user device 
processed the received audio to extract a signature 
characteristic of the film that they are about to watch. 
The processing that is preferred is the processing 
described in the Shazam Entertainment Ltd patent 
mentioned above. However, as those skilled in the art
will appreciate, other types of encoding may be 
performed. The main purpose of the signature extractor 
unit in the mobile telephone is to compress the audio to 
generate data that is still representative of the audio 
from which the remote server can identify the film about
to be watched. Various other compression schemes may be 
used. For example, a GSM codec together with other audio 
compression algorithms may be used. 



In the above embodiments in which text captions are
provided, they were displayed to the user on a display
of a portable user device. Whilst this offers the
simplest deployment of the captioning system, other
options are available. For example, the user may be
provided with an active or passive type head-up-display
through which the user can watch the film and on which
the captions are displayed (active) or are projected
(passive) to overlay onto the film being watched. This
has the advantage that the user does not have to watch
two separate displays. A passive type of head-up-display
can be provided, for example, by providing the user with
a pair of glasses having a beam splitter (e.g. a 45°
prism) on which the user can see the cinema screen and
the screen of their user device (e.g. phone or PDA)
sitting on their lap. Alternatively, instead of using
a head-up-display, a separate transparent screen may be
erected in front of the user's seat and onto which the
captions are projected by the user device or a
seat-mounted projector.

In the first embodiment described above, the caption file
included a time ordered sequence of captions together
with associated formatting information and timing
information. As those skilled in the art will
appreciate, it is not essential to arrange the captions
in such a time sequential order. However, arranging
them in this way reduces the processing involved in
identifying the next caption to display. Further, it is
not essential to have formatting information in addition
to the caption. The minimum information required is the
caption information. Further, it is not essential that
this be provided in a file as each of the individual
captions for the presentation may be downloaded
separately. However, the above described format for the
caption file is preferred since it is simple and can
easily be created using, for example, a spreadsheet.
This simplicity also provides the potential to create
a variety of different caption content.
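
A minimal sketch of such a caption file, and of why the time ordering
reduces processing, is given below. The column names, the comma separated
layout and the sample captions are assumptions made for illustration; the
embodiments above do not prescribe a particular file layout. Because the
rows are sorted by start time, the current caption can be located with a
binary search rather than a scan of the whole file.

```python
import csv
import io
from bisect import bisect_right

# Hypothetical caption file as it might be exported from a spreadsheet:
# start time (s), duration (s), formatting, caption text.
CAPTION_CSV = """start,duration,format,text
12.0,3.0,plain,Good evening.
16.5,2.5,italic,(door slams)
21.0,4.0,plain,Who is there?
"""


def load_captions(text):
    """Parse the caption file into (start, duration, format, text) tuples."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        (float(r["start"]), float(r["duration"]), r["format"], r["text"])
        for r in rows
    ]


def caption_at(captions, film_time):
    """Return the caption text to show at film_time, or None if none is due."""
    starts = [c[0] for c in captions]
    i = bisect_right(starts, film_time) - 1  # last caption starting at or before film_time
    if i < 0:
        return None
    start, duration, _, text = captions[i]
    return text if film_time < start + duration else None
```

In this sketch, a film time of 17.0 seconds falls within the second
caption, whereas a film time of 15.5 seconds falls in the gap after the
first caption has expired and so yields no caption.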

In embodiments where the user's mobile telephone is used
to provide the captioning, the captioning system can be
made interactive whereby the user can interact with the
remote server, for example interacting with adverts or
questionnaires before the film starts. This interaction
can be implemented using, for example, a web browser on
the user device that receives URLs and links to other
information on websites.

In the first embodiment described above, text captions
were provided for the audio in the film to be watched.
These captions may include full captions, subtitles for
the dialogue only or subtitles at key parts of the plot.
Similar variations may be applied to audio captions.



